fix: replace manual anomalies with a hampel filter#1997
fix: replace manual anomalies with a hampel filter#1997matthewp wants to merge 6 commits intonpmx-dev:mainfrom
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
2 Skipped Deployments
|
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Lunaria Status Overview🌕 This pull request will trigger status changes. Learn moreBy default, every PR changing files present in the Lunaria configuration's You can change this by adding one of the keywords present in the Tracked Files
Warnings reference
|
📝 WalkthroughWalkthroughThis PR replaces blocklist-based anomaly correction with a Hampel-filter implementation (adds Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 1✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: beb9630b-e84a-4279-a4f8-362d18f451b4
📒 Files selected for processing (5)
app/components/Package/TrendsChart.vueapp/components/Package/WeeklyDownloadStats.vueapp/utils/download-anomalies.data.tsapp/utils/download-anomalies.tstest/unit/app/utils/download-anomalies.spec.ts
💤 Files with no reviewable changes (1)
- app/utils/download-anomalies.data.ts
There was a problem hiding this comment.
🧹 Nitpick comments (1)
app/components/Package/TrendsChart.vue (1)
1831-1838: Consider usingv-modelfor simpler checkbox binding.The explicit
:checked+@changepattern is functionally correct, butv-modelprovides equivalent behaviour with less boilerplate.♻️ Optional simplification
<input - :checked="settings.chartFilter.anomaliesFixed" - `@change`=" - settings.chartFilter.anomaliesFixed = ($event.target as HTMLInputElement).checked - " + v-model="settings.chartFilter.anomaliesFixed" type="checkbox" class="accent-[var(--accent-color,var(--fg-subtle))]" />
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 91971e88-96ef-48e4-befd-3a0865a5eb4c
📒 Files selected for processing (17)
app/components/Package/TrendsChart.vuei18n/locales/bg-BG.jsoni18n/locales/cs-CZ.jsoni18n/locales/de-DE.jsoni18n/locales/en.jsoni18n/locales/es.jsoni18n/locales/fr-FR.jsoni18n/locales/hu-HU.jsoni18n/locales/id-ID.jsoni18n/locales/ja-JP.jsoni18n/locales/pl-PL.jsoni18n/locales/ru-RU.jsoni18n/locales/tr-TR.jsoni18n/locales/uk-UA.jsoni18n/locales/zh-CN.jsoni18n/locales/zh-TW.jsoni18n/schema.json
💤 Files with no reviewable changes (16)
- i18n/locales/en.json
- i18n/locales/bg-BG.json
- i18n/locales/es.json
- i18n/locales/de-DE.json
- i18n/locales/zh-TW.json
- i18n/locales/fr-FR.json
- i18n/locales/tr-TR.json
- i18n/locales/cs-CZ.json
- i18n/locales/hu-HU.json
- i18n/schema.json
- i18n/locales/zh-CN.json
- i18n/locales/uk-UA.json
- i18n/locales/ru-RU.json
- i18n/locales/ja-JP.json
- i18n/locales/pl-PL.json
- i18n/locales/id-ID.json
|
this looks very promising! tagging @jycouet who may have some thoughts on this 🙏 |
|
Great to have a second look into this. You probably checked my initial PR, #1636 I started with hampel implementation 👍 If I remember correctly, the sweet spot was 2.5 or 3 for vite. But this setting would hide "great start" of a lib. (eg: 0 0 0 0 0 0 20000) Another note, when the start or end of the chart is in an anomalie period, any algo can't fix it. A good example is: if you start the chart in the middle of the vite spike. Maybe we should add some tests around all this ? (to keep the intent) Another note 2: I would love to have more than just vite there today ! 😅 |
|
Thanks for this! A few thoughts after pulling the branch: Reproduction URL that matters: "Great start" test cases:Packages like PR suggestionCould we expose the Hampel filter as its own (separate from manual correction), with tweakable halfWindow / threshold sliders? That way we can compare manual vs Hampel side by side and find the right balance before committing to one approach? // Happy coding |
I don't follow, as its own what? |
|
Yes I can do that. |
Skip points without a full symmetric window to avoid flattening real
growth at series edges ("great start" problem). Use a relative check
when MAD is 0 instead of flagging any deviation, so sparse packages
like [0,0,0,1,0,0,0] keep their real activity.
Expose halfWindow and threshold as sliders on a second row in the data correction panel so users can tune the filter interactively.
17e63ce to
88334f4
Compare
|
@jycouet I added sliders and addressed your concerns. |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
i18n/locales/en.json (1)
469-477:⚠️ Potential issue | 🟡 MinorRename this section away from “Known anomalies”.
These strings now sit beside Hampel window/threshold controls, but they still describe a curated set of bot/CI spikes. The PR now applies automatic statistical filtering instead, so this wording misrepresents what the toggle does and how the values are changed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 848ad525-54b8-48c1-961d-cda990fce395
📒 Files selected for processing (23)
app/components/Package/TrendsChart.vueapp/components/Package/WeeklyDownloadStats.vueapp/composables/useSettings.tsapp/utils/download-anomalies.data.tsapp/utils/download-anomalies.tsi18n/locales/az-AZ.jsoni18n/locales/bg-BG.jsoni18n/locales/cs-CZ.jsoni18n/locales/de-DE.jsoni18n/locales/en.jsoni18n/locales/es.jsoni18n/locales/fr-FR.jsoni18n/locales/hu-HU.jsoni18n/locales/id-ID.jsoni18n/locales/ja-JP.jsoni18n/locales/pl-PL.jsoni18n/locales/ru-RU.jsoni18n/locales/tr-TR.jsoni18n/locales/uk-UA.jsoni18n/locales/zh-CN.jsoni18n/locales/zh-TW.jsoni18n/schema.jsontest/unit/app/utils/download-anomalies.spec.ts
💤 Files with no reviewable changes (15)
- i18n/locales/ja-JP.json
- i18n/locales/zh-TW.json
- i18n/locales/fr-FR.json
- i18n/locales/hu-HU.json
- i18n/locales/es.json
- i18n/locales/cs-CZ.json
- i18n/locales/pl-PL.json
- i18n/locales/de-DE.json
- i18n/locales/tr-TR.json
- i18n/locales/ru-RU.json
- i18n/locales/bg-BG.json
- i18n/locales/zh-CN.json
- i18n/locales/id-ID.json
- i18n/locales/uk-UA.json
- app/utils/download-anomalies.data.ts
✅ Files skipped from review due to trivial changes (1)
- i18n/locales/az-AZ.json
| hampelWindow: number | ||
| hampelThreshold: number |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
find . -type f -name "useSettings.ts" | head -5Repository: npmx-dev/npmx.dev
Length of output: 93
🌐 Web query:
VueUse useLocalStorage mergeDefaults shallow merge documentation
💡 Result:
In VueUse, useLocalStorage() uses the same options as useStorage(), and the mergeDefaults behavior is documented on the useStorage page. (vueuse.org)
mergeDefaults: truemerges the default value with the value read from storage, and for objects it does a shallow merge (top-level keys only). (vueuse.org)- For a deep merge, pass a custom function to
mergeDefaults(the docs explicitly show this pattern). (vueuse.org)
Example (custom merge = deep merge):
import { useLocalStorage } from '@vueuse/core'
// import deepMerge from '...'
const state = useLocalStorage('key', { a: { b: 1 } }, {
mergeDefaults: (storageValue, defaults) => deepMerge(defaults, storageValue),
})Citations:
- 1: https://vueuse.org/core/usestorage/?utm_source=openai
- 2: https://vueuse.org/core/usestorage/?utm_source=openai
- 3: https://vueuse.org/core/useStorage/?utm_source=openai
🏁 Script executed:
cat -n ./app/composables/useSettings.tsRepository: npmx-dev/npmx.dev
Length of output: 7350
🏁 Script executed:
# Search for migration or normalisation logic in settings/preferences
rg -i "migrat|normaliz|hydrat" --type ts --type tsx -A 3 app/composables/ 2>/dev/null | head -50Repository: npmx-dev/npmx.dev
Length of output: 43
🏁 Script executed:
# Check usePreferencesProvider.ts for migration patterns
find . -type f -name "*usePreferences*" -o -name "*preferences*" | grep -E "\.(ts|tsx)$"Repository: npmx-dev/npmx.dev
Length of output: 236
🏁 Script executed:
head -150 ./app/composables/usePreferencesProvider.tsRepository: npmx-dev/npmx.dev
Length of output: 2609
🏁 Script executed:
# Search for any settings normalisation, migration, or post-hydration logic
rg "chartFilter|hampel" --type ts --type tsx -B 3 -A 3 app/ | head -100Repository: npmx-dev/npmx.dev
Length of output: 88
🏁 Script executed:
# Check if there's any middleware or plugin that normalises settings
rg "DEFAULT_SETTINGS|STORAGE_KEY|npmx-settings" --type ts --type tsx -B 2 -A 2 app/Repository: npmx-dev/npmx.dev
Length of output: 88
🏁 Script executed:
# Search for chartFilter and hampel references
rg "chartFilter|hampel" --type ts -B 3 -A 3 app/Repository: npmx-dev/npmx.dev
Length of output: 1232
🏁 Script executed:
# Search for settings normalisation anywhere
rg "DEFAULT_SETTINGS|STORAGE_KEY.*npmx-settings" --type ts -B 2 -A 2 app/Repository: npmx-dev/npmx.dev
Length of output: 885
Deep-merge nested chartFilter settings on hydration.
VueUse's useLocalStorage with mergeDefaults: true only performs a shallow merge at the top level. Returning users with an existing stored chartFilter object will not receive the new hampelWindow and hampelThreshold fields—the entire default chartFilter is replaced, leaving the new settings unset until explicitly written. VueUse documentation recommends a custom merge function for nested defaults.
Suggested fix
if (!settingsRef) {
settingsRef = useLocalStorage<AppSettings>(STORAGE_KEY, DEFAULT_SETTINGS, {
- mergeDefaults: true,
+ mergeDefaults: (storageValue, defaults) => ({
+ ...defaults,
+ ...storageValue,
+ connector: { ...defaults.connector, ...(storageValue.connector ?? {}) },
+ sidebar: { ...defaults.sidebar, ...(storageValue.sidebar ?? {}) },
+ chartFilter: {
+ ...defaults.chartFilter,
+ ...(storageValue.chartFilter ?? {}),
+ },
+ }),
})
}| // Only evaluate points that have a full symmetric window. | ||
| // Boundary points lack enough context on one side, making them | ||
| // prone to false positives (e.g., a "great start" at the end of a series). | ||
| for (let i = halfWindow; i < values.length - halfWindow; i++) { | ||
| const start = i - halfWindow | ||
| const end = i + halfWindow | ||
| const window = values.slice(start, end + 1) |
There was a problem hiding this comment.
Boundary spikes still leak back in through date-range trimming.
This loop never scores the first or last halfWindow buckets. Because Package/TrendsChart.vue applies Hampel after fetching the user-selected range, any anomaly that becomes a boundary point after cropping is guaranteed to survive, so narrowing the chart around a spike can reintroduce the very point the default view removed. Please run the filter on padded history and trim afterwards, and add a regression for a cropped-range edge spike.
|
@matthewp I pulled your branch, and see the 2 controls And with http://127.0.0.1:3000/package/vite?modal=chart&start=2025-03-13&end=2026-03-11 I get:
I tried also, removing my local storage to start from "fresh", but I still see the big spike. I didn't look at the code at all, let me know how you want to proceed. |


🔗 Linked issue
Previous issue: #1707
🧭 Context
The current implementation of anomaly removal is biased due to the manual nature.
This replaces the implementation with one that uses a hampel filter to automatically remove deviations.
📚 Description
The important part here is
applyHampelCorrection.It goes over each data point and creates a sliding window of data points, by default 3 on each side.
Of those data points it finds the median, since a spike can't pull it off center.
Then it measures how spread out the neighbors are by calculating the MAD (see here).
Then it gives the data point a score and if it's above a threshold then it gets replaced with the median of that window and marked as being an anomaly.
Here are two articles about how hampel filters work:
Here's a screenshot of Vite which still shows its spike removed after this change.